French patent FR3081591A1: METHOD FOR PROCESSING A VIDEO IMAGE STREAM

Abstract:
A method of processing a video image stream to search for information, in particular to detect predefined objects and/or motion, comprising the steps of: a) providing at least one attention map in at least one space of positions and scales of at least one image of the video stream, b) selecting in this space points to be analyzed by making the selection depend on at least the values of the coefficients of the attention map at these points, at least a part of the points to be analyzed being selected by random draw, with a probability of selection during the draw at a point depending on the value of the attention map at this point, a bias being introduced into the map to give every point a non-zero probability of selection, c) analyzing the selected points to look for said information, d) updating the attention map at least for the processing of the next image, from at least the result of the analysis performed in c), e) repeating steps a) to d) for each new image of the video stream and/or for the current image on at least one different scale.
Publication number: FR3081591A1
Application number: FR1854286
Filing date: 2018-05-23
Publication date: 2019-11-29
Inventors: Maxime Thiebaut; Vincent Despiegel; Dora CSILLAG
Applicant: Idemia Identity and Security France SAS
Main IPC class:

Description:

The present invention relates to the processing of a video stream in order to analyze the images.
This involves, for example, extracting information from video recordings that is useful to investigators, allowing the identification of pedestrians or other objects.
With the increase in the number of cameras and the improvement in resolution, the volume of video data to be processed becomes considerable, and exploiting it to find relevant information requires significant hardware and human resources.
Figure 1 illustrates a processing method of the prior art. Each image in the video stream undergoes detection processing which aims to identify predefined objects in the image, for example pedestrians or others.
The objective of the detection algorithm is to give detection results, typically in terms of position in the image, for example x, y, and size, for example height, width. The space of positions and scales can be represented by a 4-tuple, for example (x, y, width, height), but it is possible to generalize to more complex forms.
FIG. 2 shows two detection results in such a space, the first detection having the coordinates (x1, y1, width1, height1) therein, and the second (x2, y2, width2, height2).
A recognition analysis may possibly be carried out on these objects, in order for example to identify a face or to read the license plate of a vehicle.
To reduce processing times at the hardware level, it is possible to reduce the number of pixels analyzed by decimating the video stream, either spatially by resizing or cropping the images, or temporally by sampling the images, for example by processing one image every n images, n typically being between 2 and 5.
However, such decimation degrades the detection capacity and tends to increase false acceptances. In particular, resizing the image at the start of processing has an impact on the ability to detect small objects, while temporal decimation, which retains only one image in n, has an impact on tracking and detection performance.
Application WO 2017/005930 A1 discloses a method for detecting objects, in particular pedestrians, by a processing operation comprising the scanning of an image by a sliding window of predetermined fixed size, linked to the detection scale. The scanning is carried out according to a predefined cycle. We first memorize each region of interest of a first subset that gave rise to the detection of an object. Then, the analysis is repeated for the following images on a second subset of regions of interest, consisting on the one hand of the regions of interest previously stored and the regions of interest contiguous to them, and on the other hand regions of interest obtained by moving the sliding window. Such a deterministic detection method is not completely satisfactory, since the course of the detection cycle may prove to be unfavorable for the rapid detection of objects appearing in new areas of the image. In addition, detection is performed on windows of imposed shape, which limits the detection capacity.
There is therefore a need to benefit from a solution making it possible to optimize the processing time and / or improve the detection performance, in particular by reducing the rate of false acceptances.
The invention meets this need thanks to a process for processing a stream of video images to search for information therein, in particular detecting predefined objects and / or a movement, comprising the steps consisting in:
a) providing at least one attention map in at least one space of the positions and scales of at least one image of the video stream,
b) selecting in this space points to be analyzed by making the selection depend at least on the values of the coefficients of the attention map at these points, at least part of the points to be analyzed being selected by random draw, with a probability of selection during the draw at a point depending on the value of the attention map at this point, a bias being introduced in the map to give every point a non-zero probability of selection,
c) analyzing the selected points to find said information,
d) updating the attention map at least for the processing of the following image, at least from the result of the analysis carried out in c),
e) repeating steps a) to d) for each new image in the video stream and/or for the current image on at least one different scale.
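Steps a) to e) can be read as a loop over the images of the stream. The Python sketch below is illustrative only: the function names (`select_points`, `process_stream`, `simple_update`), the representation of the attention map as a dictionary keyed by points (x, y, width, height), and the toy update rule are assumptions made here and are not taken from the patent text.

```python
import random

def select_points(attention_map, bias, rng):
    """Step b): draw each point with probability max(coefficient, bias)."""
    selected = []
    for point, coeff in attention_map.items():
        p = max(coeff, bias)  # the bias keeps every selection probability non-zero
        if rng.random() < p:  # rng.random() lies in [0, 1), so p = 1 always selects
            selected.append(point)
    return selected

def process_stream(frames, points, bias, analyze_point, update_map, seed=0):
    """Run steps a) to e) over an iterable of frames.

    analyze_point(frame, point) -> bool and
    update_map(attention_map, results, bias) -> dict
    are hypothetical callbacks standing in for the detector and the update rule.
    """
    rng = random.Random(seed)
    attention_map = {p: bias for p in points}  # a) uniform map, equal to the bias
    history = []
    for frame in frames:  # e) repeat for each new image
        selected = select_points(attention_map, bias, rng)        # b)
        results = {p: analyze_point(frame, p) for p in selected}  # c)
        attention_map = update_map(attention_map, results, bias)  # d)
        history.append(results)
    return attention_map, history

def simple_update(attention_map, results, bias):
    """Toy update: force analysis (1.0) after a positive detection, else use the bias."""
    new_map = dict(attention_map)
    for point, positive in results.items():
        new_map[point] = 1.0 if positive else bias
    return new_map
```

With a detector that fires at a single point, that point ends up with coefficient 1 and is analyzed on every subsequent image, while every other point stays at the bias and is only visited occasionally.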
The method according to the invention makes it possible to devote on average more time to the pixels of the image which have a greater probability of containing the information sought, regardless of the technique used for detection.
The attention map can be initialized by giving the same value to all the points, for a given detection scale, for example a value equal to the bias. The latter can make it possible to guarantee a detection rate equal to the detection rate obtained without optimization, except for detection latency. This makes it possible to analyze all the regions of the image first, before focusing the search on certain areas. For a point where, due to the result of the draw and the value of the coefficient of the attention map at this point, no positive detection is carried out, the value of the coefficient may remain unchanged for this point at the detection scale concerned, and for example equal to the bias. On the other hand, when a positive detection is carried out at a point, the value of the coefficient at this point is updated and modified to take a value leading to a higher analysis frequency.
The bias, which consists in giving a non-zero value to each coefficient of the attention map, ensures that all the pixels end up being processed thanks to the random draw. In this way the method spends a minimum of time on pixels other than those which the corresponding values of the attention map cause to be processed most frequently. Thus, the bias leads to a certain amount of calculation time being devoted to each pixel.
The values of the attention map coefficients, and in particular the choice of the value of the bias, depend on the nature of the objects to be detected and on the manner in which the detection is carried out. The calculation of the values of the coefficients of the attention map is preferably carried out by learning from a database representative of the type of detection which it is sought to carry out, for example of the nature of the objects sought and/or of the movements in the image that one seeks to detect, this database being preferably chosen to maximize the detection performance while minimizing the computation times.
It is also possible to choose as a bias value a value which depends on parameters, for example which is adaptive as a function of the time and / or of the resources available or of the quality of detection that is desired.
For example, if you want a higher quality of detection, you can modify the value of the bias to increase the analysis frequency at any point.
The probabilistic approach of choosing the pixels analyzed according to the invention offers great freedom in the choice of the detection method. One can for example, if desired, apply an object detection method based on the analysis of regions of interest or a detection method based on a motion detection, not involving regions of interest.
Random sampling does not favor a particular area among those where one does not expect to find information a priori. Thus, in the event of a change occurring in such an area, it can be detected quickly.
This method is preferably implemented on several detection scales. The probability of finding the information sought by analyzing a given pixel can indeed vary with the size of the object that one seeks to detect.
One can make the calculation of the coefficients of the attention map dependent on the result of at least one previous detection and/or on the values of the coefficients of at least one previous attention map. For example, when updating the attention map, one can give a coefficient of the attention map at a point a value leading to an analysis frequency that is all the higher as this point is closer, in the space of positions and scales (x, y, height, width), to a positive detection. The notion of proximity is a function which is defined for each type of detection, and this function can be learned from representative data. For example, for a simultaneous vehicle and pedestrian detection algorithm, the concept of proximity is not the same for the vehicle and the pedestrian, because the expected speeds of the objects are different. It is possible to dynamically adapt the parameters defining the concept of proximity, depending for example on the measured speed of the objects and/or on a preliminary calibration of the camera.
It is also possible to introduce a statistical bias which is static in space, for example if one wishes to permanently favor a given area in the field of vision of the camera, for example in the center of it. Such a static bias is for example a function of the x, y coordinates, the height and the width of the object, which makes it possible to favor, in a given area of the field of view, for example in the center of the image, objects of a given size. The invention can be applied to the detection of objects, such as pedestrians for example.
As a variant, the invention applies to the detection of movements on an image.
It can be any type of object, for example pedestrian, animal, vehicle ...
Depending on the result of the detection for a given image at a given scale, at least one region of interest can be defined in this image at this scale, and, for the processing of at least one following image, the attention map at this scale is updated on the basis of this region of interest, in particular by bringing to a predefined value, chosen as a function of the value of the bias, all the points of the attention map corresponding to this region of interest at this scale.
The value of the attention map at a point can be given by the following formula, whatever the nature of the object and the corresponding detection algorithm, for example suitable for the detection of pedestrians or the detection of movement :
$$\mathrm{attention\_map}(t+1) = \max\Big(\mathrm{bias\_probability},\ \mathrm{temporal\_filter}\big(\big(\mathrm{proximity\_function}(\mathrm{algo\_output}(i))\big)_{i \le t}\big)\Big) \qquad (1)$$
In this expression, "attention_map(t+1)" designates the attention map, homogeneous to a probability map, at time t+1, calculated from data at time t or at previous times.
In an implementation example, a value of 1 at a point indicates maximum attention for the next image, while a value of 0 indicates zero attention.
The "max" function corresponds to the maximum function. "proximity_function" designates a function which transforms an algorithm output into a probability map.
In an example of implementation, this function gives high values close to a detection. In this case, the maximum value of the attention map can be 1. It can also adapt to external parameters such as the speed of movement of objects or the configuration of the scene.
The expression "algo_output(i)" designates the output of the detection algorithm at time i. It can be bounding boxes, a segmentation, a motion map, etc.
The "temporal_filter" function designates a temporal filter whose objective is to merge several probability maps, typically by giving greater weight to recent times (t, t-1, t-2, ...). This makes it possible to decide whether or not to analyze a point while taking into account a certain history of detections at this point or nearby, for example.
In the context of object detection in particular, one can for example perform a first search on the image at a first scale, that is to say with x and y variable and with l and h, which characterize the detection scale, fixed in the space (x, y, l, h), where x and y are the coordinates in the image and l and h the width and height of the detection window, fixed for example at l1 and h1, then a second search on the image at a second scale, that is to say with x and y variable and l and h respectively fixed at other values l2 and h2 different from l1 and h1. A positive detection makes it possible to define a region of interest for the processing of at least one following image. This region of interest can be at least the size of the detection. The state of the map coefficients relating to an area where a positive detection takes place can therefore be updated for the following image.
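As an illustration of formula (1), the following Python sketch computes an attention map for one detection scale from a short history of detections. The concrete choices are assumptions made for the example, since the text leaves them open (they may be learned): the proximity function decays linearly with the distance to a bounding box, and the temporal filter keeps a per-pixel maximum of the past maps weighted by `decay ** age`. All function and parameter names are hypothetical.

```python
def proximity_map(detections, width, height, reach=2):
    """Hypothetical proximity function: 1.0 inside a detection box (x, y, w, h),
    decaying linearly to 0 beyond `reach` pixels outside it."""
    grid = [[0.0] * width for _ in range(height)]
    for (bx, by, bw, bh) in detections:
        for y in range(height):
            for x in range(width):
                # Chebyshev distance from pixel (x, y) to the box (0 inside the box)
                dx = max(bx - x, 0, x - (bx + bw - 1))
                dy = max(by - y, 0, y - (by + bh - 1))
                value = max(0.0, 1.0 - max(dx, dy) / (reach + 1))
                grid[y][x] = max(grid[y][x], value)
    return grid

def temporal_filter(maps, decay=0.5):
    """Hypothetical temporal filter: maps[-1] is time t; older maps are weighted
    by decay**age and the per-pixel maximum is kept."""
    height, width = len(maps[0]), len(maps[0][0])
    out = [[0.0] * width for _ in range(height)]
    for age, m in enumerate(reversed(maps)):
        weight = decay ** age
        for y in range(height):
            for x in range(width):
                out[y][x] = max(out[y][x], weight * m[y][x])
    return out

def next_attention_map(detection_history, width, height, bias):
    """Formula (1): per-pixel maximum of the bias and the filtered proximity maps.
    detection_history[i] is the list of boxes output by the detector at time i."""
    maps = [proximity_map(d, width, height) for d in detection_history]
    filtered = temporal_filter(maps)
    return [[max(bias, v) for v in row] for row in filtered]
```

With these choices, a detection at time t gives a coefficient of 1 inside the box for time t+1, a detection that is one image older gives 0.5, and points far from any detection stay at the bias.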
In an example of implementation and in the case of the detection of objects in particular, such as pedestrians for example, the coefficients of the attention map in this region of interest can take either an extreme value which forces the selection, for example the value 1, that is to say the analysis is forced at this point in the space of positions and scales, or the value of the bias otherwise, which amounts to carrying out the detection at a lower frequency, the number of successive images without detection at a given point depending on the bias. For example, the lower this bias, the larger the number of images in the video stream that pass without the image being analyzed at this point. The extreme value can be given at a point of the attention map for processing the image of rank N + 1 in the video stream if the point in question corresponds to a positive detection at image N or is sufficiently close, spatially and temporally, to a positive detection in an earlier image. In a variant, the value which forces the detection, for example the value 1 as above, is replaced by a close but not extreme value, so as to save resources. In this case, the frequency of analysis remains relatively high in the region of interest, but the analysis does not take place systematically in the region of interest at each iteration.
A binary mask per detection scale can be generated in step b) from the attention map, this binary mask being applied to at least one image of the stream to be analyzed in step c), the analysis being carried out only on the non-masked pixels, all the pixels of the mask preferably being initialized to the same value corresponding to the absence of masking.
In accordance with an aspect of the invention defined above, a draw is made to select at least part of the points on which the analysis is carried out, in particular to determine on which pixels of the image located outside the areas of interest the analysis is carried out. This draw can take place for each coefficient of the attention map or only for those whose value is not extreme to the point of forcing the analysis at this point, and the result of the comparison of this draw with the value of the coefficient determines the value of the binary mask. By "draw" is meant the fact of generating numbers between certain limits, for example between 0 and 1. This draw is random. There is thus a random decimation of the video stream outside the areas of interest, and a random selection of the pixels analyzed, with a higher probability of analysis of the pixels in the area or areas of interest, because the values of the attention map are taken into account in the creation of the mask.
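The generation of the binary mask from the draw can be sketched as follows in Python. This is a minimal sketch under stated assumptions: the attention map is a list of rows of coefficients in [0, 1], one uniform draw in [0, 1) is made per coefficient, and the helper names `binary_mask` and `analysis_counts` are hypothetical.

```python
import random

def binary_mask(attention_map, rng):
    """One draw in [0, 1) per coefficient; a pixel is kept (mask = 1) when the
    draw falls below its coefficient, so a coefficient of 1.0 forces analysis."""
    return [[1 if rng.random() < coeff else 0 for coeff in row]
            for row in attention_map]

def analysis_counts(attention_map, n_frames, seed=0):
    """Count how often each pixel is analyzed over n_frames independent masks."""
    rng = random.Random(seed)
    counts = [[0] * len(row) for row in attention_map]
    for _ in range(n_frames):
        mask = binary_mask(attention_map, rng)
        for y, row in enumerate(mask):
            for x, bit in enumerate(row):
                counts[y][x] += bit
    return counts
```

Over many images, a pixel whose coefficient is 1 is analyzed every time, while a pixel at a bias of 0.2 is analyzed in roughly one image out of five.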
As mentioned above, the attention map can be initialized by giving the same value to all pixels. This makes it possible to analyze all the regions of the image first, before concentrating the search on the regions of interest. The values taken by the coefficients of the attention map depend on the result of the detection. The value assigned to a coefficient associated with a pixel at a detection scale may depend on the states of this pixel in the previous images and on the result of the detection. For example, the value of a coefficient of the attention map associated with a given pixel at a detection scale at a given instant can be all the more extreme, in particular high, as there has been a detection close to this pixel in the image, and/or as a non-thresholded (that is to say non-binary) confidence score of this pixel or in the vicinity of this pixel in the image, at a detection scale and at a time of detection, is high. This can be the case, for example, for motion detection, where one can work directly on a non-thresholded map, with a detection algorithm that works at the pixel level. This can also be the case for pedestrian detection, if one operates at the pixel level as well.
As mentioned above, the frequency with which the analysis is carried out outside the region or regions of interest is controlled by the introduction of a bias in the attention map outside the region or regions of interest. This bias corresponds to the value given by the attention map to each pixel of the image at a given detection scale. The presence of the bias ensures that all the pixels in the image are analyzed after a certain latency, even if the probability that these pixels contain the information sought is low. Thus a new object entering the field of the camera may not be detected immediately, but will be detected in the following images.
The bias can thus be chosen so as to ensure an analysis of all the pixels outside the region or regions of interest with a latency of m images, with m between 3 and 10 images, preferably equal to 5. In other words, each pixel is analyzed on average at the latest every m images. A value of 5 is relatively transparent in terms of algorithmic detection performance or user experience in the case of pedestrian detection and tracking with a video at 25 frames/s. The frequency of analysis corresponds to the inverse of the latency. In the regions of interest, the analysis frequency can be 1, that is, a given pixel of these regions of interest is analyzed for each image of the video stream, or close to 1, that is, a given pixel is analyzed on average every n images, with n close to 1, while outside the regions of interest, a given pixel is analyzed on average every k images, with k > n. For a pixel where, due to the result of the draw and the value of the coefficient of the attention map, no detection is carried out, the value of the coefficient may remain unchanged for this pixel and this detection scale. On the other hand, when a detection is carried out in a given zone, the state of the coefficients of the map relating to this zone is updated according to the characteristics of the detection.
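The link between the bias and the latency can be made explicit under one assumption that the text does not state: the draws for a given pixel are independent from one image to the next. With b the value of the coefficient outside the regions of interest, the number of images T until the pixel is next analyzed then follows a geometric distribution:

```latex
P(T = k) = (1 - b)^{k-1}\, b, \qquad
\mathbb{E}[T] = \frac{1}{b}, \qquad
P(T > k) = (1 - b)^{k}.
```

Under this assumption, a mean latency of m = 5 images corresponds to b = 1/5 = 0.2, and the probability that a pixel has still not been analyzed after 10 images is 0.8^10, approximately 0.107.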
When the detection is positive in at least two nearby regions of interest, the method preferably comprises the merging of these regions of interest and the corresponding updating of the associated coefficients of the attention map for the processing of at least a next image.
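The merging of nearby regions of interest can be sketched as follows in Python. The representation of a region as an axis-aligned box (x, y, w, h), the `margin` parameter defining "nearby", and the choice of the bounding union as the merged region are assumptions made for the illustration.

```python
def boxes_are_close(a, b, margin=0):
    """True when boxes (x, y, w, h) overlap or are less than `margin` pixels apart."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return (ax - margin < bx + bw and bx - margin < ax + aw and
            ay - margin < by + bh and by - margin < ay + ah)

def union_box(a, b):
    """Smallest box (x, y, w, h) containing both a and b."""
    x0, y0 = min(a[0], b[0]), min(a[1], b[1])
    x1 = max(a[0] + a[2], b[0] + b[2])
    y1 = max(a[1] + a[3], b[1] + b[3])
    return (x0, y0, x1 - x0, y1 - y0)

def merge_regions(boxes, margin=0):
    """Repeatedly merge any two close boxes until no pair is close."""
    boxes = list(boxes)
    merged = True
    while merged:
        merged = False
        for i in range(len(boxes)):
            for j in range(i + 1, len(boxes)):
                if boxes_are_close(boxes[i], boxes[j], margin):
                    boxes[i] = union_box(boxes[i], boxes[j])
                    del boxes[j]
                    merged = True
                    break
            if merged:
                break
    return boxes
```

The coefficients of the attention map covered by each merged box can then be updated in a single pass, as for a region of interest obtained from a single detection.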
When the detection is positive at at least one point in the space of positions and scales for a given image, the method preferably comprises the generation of a region of interest enlarged compared to the dimensions of an analysis window given by the scale at which the analysis is carried out, and the corresponding updating of the associated coefficients of the attention map for the processing of the following image. Each region of interest can thus be determined by taking an enlarged area of the image around information previously detected. This anticipates the fact that a previously detected object is likely to move in the image, and the size of the enlarged area is chosen so that it includes the possible displacement of the object in the image. For example, if the information previously detected is a pedestrian, the enlarged area can be a rectangle that includes this pedestrian.
The enlarged area of interest can be determined by morphological dilation. In particular, at least one region of interest associated with a detected object can be determined by a morphological dilation of this detected object. The parameters of the morphological dilation can be fixed or variable, and in particular depend on the size of the region of interest. The size of a region of interest can be at least twice that of a previously detected object inside, or even three times or more.
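A region of interest obtained by morphological dilation of a detected object can be sketched as follows in Python. The square structuring element and the fixed `radius` are assumptions chosen for the example; as the text notes, the dilation parameters may instead vary with the size of the region of interest. The function names are hypothetical.

```python
def box_mask(box, width, height):
    """Binary mask that is 1 inside the detection box (x, y, w, h), 0 elsewhere."""
    bx, by, bw, bh = box
    return [[1 if bx <= x < bx + bw and by <= y < by + bh else 0
             for x in range(width)] for y in range(height)]

def dilate(mask, radius):
    """Binary morphological dilation with a (2*radius+1) square structuring
    element, clipped at the image borders."""
    height, width = len(mask), len(mask[0])
    out = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            if mask[y][x]:
                for yy in range(max(0, y - radius), min(height, y + radius + 1)):
                    for xx in range(max(0, x - radius), min(width, x + radius + 1)):
                        out[yy][xx] = 1
    return out

def area(mask):
    """Number of pixels set to 1."""
    return sum(sum(row) for row in mask)
```

For a 2 x 4 detection away from the image borders, a dilation of radius 1 yields a 4 x 6 region, that is three times the area of the detected object.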
The determination of the regions of interest from the detections carried out in the space of positions and scales can be done according to rules determined by learning, using a database of representative data, as mentioned above. The choice of the shape and dimensions of the above-mentioned enlarged zone can thus result from learning.
In the case of motion detection, each pixel of the image can be analyzed with a frequency which is determined by the values of the attention map.
The method according to the invention can be implemented at different image resolutions and independently for each of the pixels.
The attention map of step a) can be calculated from a probability map of pixel movement in the image. The attention map is for example calculated from this probability map and from a transfer function which controls the frequency of pixel analysis. The greater the probability of movement of a pixel in the image, the more the attention map will have for this pixel a value which leads to a frequent analysis of this pixel; conversely, a pixel which remains motionless in the images, because it corresponds for example to a fixed background, will have a low probability of movement, and the attention map for this pixel will have a value set by the bias, which leads to an analysis of this pixel at a frequency that is low but sufficient not to unduly degrade the detection capacity of the system.
The attention map can be calculated from the general formula (1) given above.
In a particular case, the attention map is calculated from the motion probability map and a transfer function, for example as follows: $\mathrm{attention\_map} = \max(\mathrm{bias\_probability},\ \mathrm{dilation}(\mathrm{motion\_map}))$.
The dilation in question is for example the morphological dilation.
For example, where the dilation is zero, because we are too far from the object, the value of the bias is taken as the value of the associated coefficient of the attention map. Where the value resulting from the dilation is greater than the bias, this higher value is taken.
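The particular case above, a per-pixel maximum of the bias and the dilated motion probability map, can be sketched in Python as follows. The greyscale dilation with a square neighbourhood and the `radius` parameter are assumptions made for the illustration; the names `grey_dilate` and `motion_attention_map` are hypothetical.

```python
def grey_dilate(grid, radius):
    """Greyscale morphological dilation: each output pixel is the maximum of the
    input over a (2*radius+1) square neighbourhood, clipped at the borders."""
    height, width = len(grid), len(grid[0])
    return [[max(grid[yy][xx]
                 for yy in range(max(0, y - radius), min(height, y + radius + 1))
                 for xx in range(max(0, x - radius), min(width, x + radius + 1)))
             for x in range(width)] for y in range(height)]

def motion_attention_map(motion_map, bias, radius=1):
    """attention = max(bias, dilation(motion probability map)), per pixel."""
    dilated = grey_dilate(motion_map, radius)
    return [[max(bias, v) for v in row] for row in dilated]
```

A pixel next to a moving object inherits the high probability of its neighbour, whereas a pixel far from any movement, or whose dilated probability is below the bias, receives the bias.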
Another subject of the invention is a computer program product comprising a set of instruction lines recorded on a medium or downloadable from a server, for, when executed on a computer, causing the processing of images according to the method according to the invention as defined above.
The subject of the invention is also a video processing system for implementing the method according to the invention, comprising at least one camera or a video recording system generating a video stream and a system for processing this video stream, comprising computer means such as a dedicated processor or a microcomputer for processing images with a view to searching for given information therein, configured to implement the method according to the invention, as defined above. In particular, this processing can take place in real time or in deferred time.
The features of the invention set forth in connection with the method apply to the computer program product and the video processing system.
Brief description of the figures
The invention will be better understood on reading the detailed description which follows, of nonlimiting examples of implementation thereof, and on examining the appended drawing, in which:
- FIG. 1, previously described, corresponds to the prior art,
- FIG. 2 illustrates the concept of space of positions and scales on an image,
- FIG. 3 is a block diagram illustrating an example of a method according to the invention,
- FIGS. 4A and 4B are two examples of images extracted from a video stream, within the framework of the application of the invention to the detection of objects, on which the outline of the detected objects and of the zones of interest has been traced,
- FIG. 5 is another example of an image in the case of the application of the invention to motion detection, and
- FIG. 6 shows the attention map corresponding to the image of FIG. 5.
We will describe with reference to Figure 3 an example of a processing method according to the invention, for processing a video stream V.
This is for example a video stream coming from video surveillance cameras, and we want to search in the images of this stream for given information, for example finding an object with predefined characteristics, such as a pedestrian. Alternatively, it is a motion detection.
The method includes a detection engine 10 which provides a detection result 11. The detection engine can use different detection techniques, at different detection scales, depending on whether the aim is to detect an object, such as a pedestrian for example, or to perform motion detection. The detection engine 10 corresponds to an algorithm implemented within a microcomputer or a dedicated processor.
Among the detection techniques that can be used in the context of object detection in particular, one can cite ACF (Aggregated Channel Features), DPM (Deformable Part Models) and deep learning, among others.
The article "Fast Feature Pyramids for Object Detection" by Piotr Dollar et al., submitted to IEEE Transactions on Pattern Analysis and Machine Intelligence (2014/09), describes examples of techniques that can be used.
The article "Fast Human Detection for Intelligent Monitoring Using Surveillance Visible Sensors" by Byoung Chul Ko et al., published in Sensors 2014, 14, 21247-21257, discloses pedestrian detection in which an optimal scale factor is determined by using adaptive regions of interest.
The result of the detection, namely the presence of predefined objects in the images or the presence of a movement, may, depending on the application, be sent at 12 to a higher-level system, in order for example to process these objects with a view to identifying them.
Whether for the detection of objects or of movement, the method according to the invention relies on the use of at least one attention map in a given space of positions and detection scales. By "attention map" we mean a matrix whose coefficients are associated with points in the space of positions and scales. The value of each coefficient is representative of the attention that the detection algorithm must pay to this point, in other words a higher attention where information is likely to be found, taking into account the result of the analysis of the previously processed images, compared to the locations in the image where the information sought is unlikely to be found in view of this result. This higher attention results in a higher frequency of analysis of the pixels concerned.
The method comprises a step of updating each attention map for a given detection scale in view of the result 11 of the detection at this scale. This update can also be carried out, if necessary, taking into account the values previously taken by the map when processing the previous images.
All the coefficients of the attention map may have been initialized to the same value, for example a non-zero bias b between 0 and 1, limits excluded.
The update of the attention map in step 14 in FIG. 3 is carried out on the basis of learned data 15. The learning of this data can be carried out in various ways. It is a question of teaching the system where the probability of finding the information sought is greatest, given the nature and/or location of the detected objects and their movement, if any.
If we refer to the example of FIG. 4A, which relates to the detection of objects, we have materialized on the image the detected objects, in this case pedestrians. These objects are delimited by rectangles 16 whose long sides are vertical.
Updating the attention map involves updating the value of the attention map coefficients which in this example correspond to the pixels encompassed by these rectangles and which are analyzed.
Advantageously, in the example of pedestrian detection, we define enlarged regions of interest around the detected objects to take account of the fact that these objects move around the image, and thus ensure that in the following image the analysis focuses mainly on these regions.
The shape of the enlarged regions of interest can result from learning, and take into account the nature of the objects and/or their movement.
The enlarged regions of interest can be determined by subjecting the detected objects to a mathematical transformation, such as a morphological dilation for example.
In FIG. 4A, the contour 17 of the enlarged regions of interest is shown. If, during the calculation of the enlarged regions of interest associated with the different detected objects, overlapping or close zones are obtained, these zones can be merged into one, which is the case of the zone located on the right in FIG. 4A. It can be seen that each enlarged region of interest occupies an area equal to several times that of the object or objects contained inside, for example at least 3 times.
Figure 4B shows an image from the same camera at a different time. It can be seen that the enlarged regions of interest remain centered on the pedestrians detected.
The attention map has its coefficients in the enlarged regions of interest updated. A higher value is given to a coefficient to reflect a higher probability that the pixel associated with this coefficient of the attention map contains the information sought. All the coefficients of the attention map corresponding to regions of interest can take, for example, an extreme value c, for example maximum and for example equal to 1, to force detection at these points.
Several attention maps are thus updated after processing each image of the stream, since there is in the example considered one map per detection scale.
Then we make sure that we analyze the pixels located in the areas of interest more often than outside those areas.
However, we regularly come to observe outside the regions of interest, to detect new objects that would appear in them.
For this, a random draw 20 is carried out for each detection scale, as illustrated in FIG. 3, and on the basis of this draw and the attention card, a tone mask 21 is generated which will determine the areas where s 'will perform the detection, all the pixels of this mask being in this example initially at 1 to ensure that the initial detection relates to all the pixels of the image.
For example, a random draw is made between 0 and 1 and the value of this draw is compared with the value of the attention map at a given point. Assume, for example, that the bias b is 0.2 and that the attention map coefficients in the regions of interest take the maximum value c = 1. The binary mask takes the value 1 whenever the draw is less than the value of the attention map coefficient, which means that the corresponding pixel of the image is analyzed in step 10. For example, for a map coefficient equal to 0.2, corresponding to a pixel located outside an area of interest, a draw equal to 0.5 gives a mask value of 0 because the coefficient is less than the draw, and the corresponding pixel of the image is not analyzed in step 10; for a draw equal to 0.1, the pixel is analyzed because the coefficient is greater than the draw. For an attention map coefficient corresponding to a pixel located in an area of interest, the draw is always less than 1 and the pixel is always analyzed in step 10. A pixel located outside an area of interest therefore yields a binary mask value that is statistically 0 more often than a pixel located in an area of interest does. Thus, the pixels located in the regions of interest are analyzed on average more frequently than the others. The draw can be made for all pixels, but the decision depends on the attention map. The bias guarantees that nothing is lost in detection.
The value of the bias b determines the latency, that is to say the number of images that are processed on average without a given pixel located outside an area of interest being analyzed. For example, this latency is around five images in the case of pedestrian detection for a video providing 25 frames/s; this means that in the image area corresponding to the lawn at the bottom left in FIGS. 4A and 4B, a pixel is analyzed on average only once every 5 images. Processing efficiency is thereby improved, since an unnecessary analysis in detection step 10 is avoided in an area where it is unlikely that a pedestrian will move, the analysis focusing automatically on the regions where the probability of detecting pedestrians from one image to the next is highest.
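By way of illustration only, the random draw and the resulting binary mask can be sketched in Python as follows. This is a minimal sketch under stated assumptions and not the implementation of the invention: the names draw_mask and expected_latency are hypothetical, the attention map is assumed to be a list of rows of coefficients between 0 and 1, and the latency is approximated as 1/b on the assumption that the draws are independent from one image to the next.

```python
import random


def draw_mask(attention_map, rng):
    """Return a binary mask: 1 where the pixel is selected for analysis.

    attention_map is assumed to be a list of rows of coefficients in [0, 1].
    """
    mask = []
    for row in attention_map:
        # One uniform draw in [0, 1) per pixel. The pixel is selected when
        # the draw is less than the attention coefficient at that pixel, so
        # a coefficient of 1 is always selected.
        mask.append([1 if rng.random() < coeff else 0 for coeff in row])
    return mask


def expected_latency(bias):
    """Approximate mean number of images between two analyses of a pixel
    whose coefficient stays at the bias (independent draws assumed)."""
    return 1.0 / bias


# Example values taken from the description: bias b = 0.2, maximum c = 1.
b, c = 0.2, 1.0
attention = [[b, b, c, c],
             [b, b, c, c]]
mask = draw_mask(attention, random.Random(0))
latency = expected_latency(b)
```

With b = 0.2 the approximation gives a latency of about five images, which is consistent with the order of magnitude mentioned above.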
When the method is applied to motion detection in the image, the calculation of the coefficients of the attention map takes account of a motion probability map, as illustrated in FIG. 5. This figure shows objects 16, consisting of two moving vehicles appearing in the image. FIG. 6 represents the motion probability map, calculated from several previous images of the video stream, from the response of each of the pixels. The detected probability of movement is high at the vehicles, and zero elsewhere.
The attention map can be calculated from this motion probability map and a transfer function, for example as follows:
attention_map = max(bias_probability, dilation(motion_map))
The dilation in question is, for example, a morphological dilation.
Where the dilation is zero, because the point is too far from the object, the attention map takes the value of the bias b. Where the value resulting from the dilation is greater than the bias b, this greater value is taken.
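As an illustration, the calculation of the attention map from a motion probability map can be sketched in Python as follows. This is only a sketch under stated assumptions: the function names dilate and attention_from_motion are hypothetical, the dilation is assumed to be a grey-level morphological dilation with a square window of half-width radius, and the maps are assumed to be lists of rows of values between 0 and 1.

```python
def dilate(grid, radius):
    """Grey-level morphological dilation with a (2*radius+1) square window.

    Each output value is the maximum of the input over the window centered
    on the pixel, the window being clipped at the image borders.
    """
    height, width = len(grid), len(grid[0])
    out = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            out[y][x] = max(
                grid[yy][xx]
                for yy in range(max(0, y - radius), min(height, y + radius + 1))
                for xx in range(max(0, x - radius), min(width, x + radius + 1))
            )
    return out


def attention_from_motion(motion_map, bias, radius=1):
    """attention_map = max(bias, dilation(motion_map)), pixel by pixel."""
    dilated = dilate(motion_map, radius)
    return [[max(bias, value) for value in row] for row in dilated]


# A single moving object at the right-hand edge of a one-row image.
motion = [[0.0, 0.0, 0.0, 0.0, 0.9]]
attention = attention_from_motion(motion, bias=0.2, radius=1)
```

Far from the object the map falls back to the bias, while the dilation extends the high values to the neighborhood of the moving pixels.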
Of course, the invention is not limited to the examples which have just been described.
In particular, the invention can be applied to video streams other than those from surveillance cameras, for example the stream from a camera fitted to a vehicle for the purpose of avoiding pedestrians.

Claims:
Claims (16)
1. Method for processing a stream of video images to search for information therein, in particular to detect predefined objects and/or a movement, comprising the steps of:
a) providing at least one attention map in at least one space of the positions and scales of at least one image of the video stream,
b) selecting in this space points to be analyzed by making the selection depend at least on the values of the coefficients of the attention map at these points, at least part of the points to be analyzed being selected by random draw with a probability of selection during the draw at a point depending on the value of the attention map at this point, a bias being introduced in the map to confer at all points a non-zero probability of selection,
c) analyzing the selected points to find said information,
d) updating the attention map at least for the processing of the following image, based at least on the result of the analysis carried out in c),
e) repeating steps a) to d) for each new image in the video stream and/or for the current image at at least one different scale.
2. Method according to claim 1, the attention map being initialized by giving the same value to all the points, for a given detection scale, in particular a value equal to the bias.
3. Method according to one of claims 1 and 2, in which the calculation of the coefficients of the attention map is made to depend on the result of at least one previous detection and/or on the values of the coefficients of at least one previous attention map.
4. Method according to claim 3, in which, when updating the attention map, the attention map coefficient at a point is given a value that is all the more extreme, in particular high, the closer this point is, in the space of positions and scales, to a positive detection.
5. Method according to any one of the preceding claims, being applied to the detection of objects, in particular pedestrians.
6. The method of claim 5, the coefficients of the attention map taking either an extreme value which forces the selection, in particular in each region of interest, or the value of the bias otherwise.
7. Method according to one of claims 5 and 6, a binary mask per detection scale being generated in step b) from the attention map for this detection scale, this binary mask being applied to at least one image of the stream to be analyzed in step c), the analysis being carried out only on the unmasked pixels, all the pixels of the mask preferably being initialized to the same value corresponding to the absence of masking.
8. Method according to any one of claims 5 to 7, in which, according to the result of the detection for a given image at a given scale, at least one region of interest is defined in this image at this scale, and the attention map at this scale is updated, for the processing of at least one following image, on the basis of this region of interest, in particular by bringing all the pixels of this region of interest at this scale to a value greater than the bias.
9. The method of claim 8, the detection being positive in at least two neighboring regions of interest, the method comprising the merging of these regions of interest and the corresponding updating of the associated coefficients of the attention map for the processing of at least one following image.
10. Method according to any one of claims 5 to 9, the detection being positive at at least one point in the space of positions and scales for a given image, the method comprising the generation of a region of interest that is enlarged with respect to the dimensions of an analysis window given by the scale at which the analysis is carried out, and the corresponding updating of the associated coefficients of the attention map for the processing of the following image.
11. Method according to claim 10, the enlarged region of interest being determined by morphological dilation.
12. The method of claim 11, the parameters of the morphological dilation being fixed.
13. The method of claim 11, the parameters of the morphological dilation being dynamic, in particular depending on the size of the region of interest or the speed of movement of the object.
14. Method according to any one of the preceding claims, being applied to motion detection.
15. Method according to any one of the preceding claims, the attention map being calculated according to the following formula:
attention_map(t+1) = max(bias_probability, temporal_filter(proximity_function(algo_output(i)), i <= t)) (1)
16. Computer program product comprising a set of instruction lines recorded on a medium or downloadable from a server, for, when executed on a computer, causing image processing according to the method as defined in any one of the preceding claims.
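For illustration only, the update of formula (1) in claim 15 can be sketched in Python as follows. Nothing in this sketch is specified by the claims: the proximity function is assumed to be a square neighborhood of half-width radius around each positive detection, the temporal filter is assumed to be a running maximum with a multiplicative decay, and the names proximity, update_attention, radius and decay are hypothetical.

```python
def proximity(detections, height, width, radius):
    """Hypothetical proximity function: 1.0 within `radius` pixels of a
    positive detection (square neighborhood), 0.0 elsewhere."""
    out = [[0.0] * width for _ in range(height)]
    for det_y, det_x in detections:
        for y in range(max(0, det_y - radius), min(height, det_y + radius + 1)):
            for x in range(max(0, det_x - radius), min(width, det_x + radius + 1)):
                out[y][x] = 1.0
    return out


def update_attention(filtered_prev, detections, bias, radius=1, decay=0.5):
    """Return (filtered, attention) for the next image.

    filtered carries the temporally filtered proximity values; it is assumed
    here to be the maximum of the current proximity and the decayed previous
    value. attention is max(bias, filtered), pixel by pixel.
    """
    height, width = len(filtered_prev), len(filtered_prev[0])
    prox = proximity(detections, height, width, radius)
    filtered = [
        [max(prox[y][x], decay * filtered_prev[y][x]) for x in range(width)]
        for y in range(height)
    ]
    attention = [[max(bias, value) for value in row] for row in filtered]
    return filtered, attention


# One detection at the left of a one-row image, then no further detection.
state = [[0.0, 0.0, 0.0, 0.0]]
state, attention_1 = update_attention(state, [(0, 0)], bias=0.2)
state, attention_2 = update_attention(state, [], bias=0.2)
```

In this sketch the coefficients near a past detection decrease from one image to the next until they reach the bias, below which they never fall.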

Similar technologies:

Publication number | Publication date | Title

CA2859900C|2019-08-13|Method of estimating optical flow on the basis of an asynchronous light sensor

US7756296B2|2010-07-13|Method for tracking objects in videos using forward and backward tracking

KR100860988B1|2008-09-30|Method and apparatus for object detection in sequences

EP0368747B1|1995-07-19|Movement estimation method for at least one target in a picture string, and device for carrying out this method

EP3572976A1|2019-11-27|Method for processing a video image stream

FR2906387A1|2008-03-28|METHOD AND DEVICE FOR SELECTING IMAGES IN A SEQUENCE OF IMAGES OF IRIS RECEIVED IN CONTINUOUS FLOW

EP1986126B1|2010-08-25|Method and device for locating a human iris in an image

EP0492724B1|1998-03-18|Motion extraction process comprising the difference image forming and a three dimensional filtering

EP2930659B1|2016-12-21|Method for detecting points of interest in a digital image

EP3200153A1|2017-08-02|Method for detecting targets on the ground and in motion, in a video stream acquired with an airborne camera

FR2947656A1|2011-01-07|METHOD FOR DETECTING AN OBSTACLE FOR A MOTOR VEHICLE

BE1026095B1|2020-11-09|Image processing device

EP3271869B1|2019-10-16|Method for processing an asynchronous signal

EP0961227B1|2003-07-30|Method of detecting the relative depth between two objects in a scene from a pair of images taken at different views

Baldwin et al.2019|Inceptive event time-surfaces for object classification using neuromorphic cameras

EP1522951A1|2005-04-13|Determination of text-discriminating characteristics in digital images

WO2004013802A2|2004-02-12|Method and system for automatically locating text areas in an image

Chandrasekhar et al.2011|A survey of techniques for background subtraction and traffic analysis on surveillance video

EP2943935B1|2017-03-08|Estimation of the movement of an image

FR3038760A1|2017-01-13|DETECTION OF OBJECTS BY PROCESSING IMAGES

EP0530088A1|1993-03-03|Method of detection and tracking of moving objects by analysis of sequences of images

Hao et al.2011|A temporal-spatial background modeling of dynamic scenes

Sun et al.2004|From GMM to HGMM: an approach in moving object detection

FR3108193A1|2021-09-17|Inspection process for a plate and associated devices

CN112329793A|2021-02-05|Significance detection method based on structure self-adaption and scale self-adaption receptive fields

Patent family:

Publication number | Publication date

EP3572976A1|2019-11-27|

AU2019203516A1|2019-12-12|

US20190362183A1|2019-11-28|

US10867211B2|2020-12-15|

FR3081591B1|2020-07-31|

Cited documents:

Publication number | Filing date | Publication date | Applicant | Title

US8363939B1|2006-10-06|2013-01-29|Hrl Laboratories, Llc|Visual attention and segmentation system|

US9740949B1|2007-06-14|2017-08-22|Hrl Laboratories, Llc|System and method for detection of objects of interest in imagery|

US20100086200A1|2008-10-03|2010-04-08|3M Innovative Properties Company|Systems and methods for multi-perspective scene analysis|

US20130084013A1|2011-09-29|2013-04-04|Hao Tang|System and method for saliency map generation|

WO2016156236A1|2015-03-31|2016-10-06|Sony Corporation|Method and electronic device|

US20020154833A1|2001-03-08|2002-10-24|Christof Koch|Computation of intrinsic perceptual saliency in visual environments, and applications|

US20050047647A1|2003-06-10|2005-03-03|Ueli Rutishauser|System and method for attentional selection|

US7471827B2|2003-10-16|2008-12-30|Microsoft Corporation|Automatic browsing path generation to present image areas with high attention value as a function of space and time|

GB2519620B|2013-10-23|2015-12-30|Imagination Tech Ltd|Skin colour probability map|

JP2020031276A|2018-08-20|2020-02-27|キヤノン株式会社|Information processing apparatus, information processing method, and program|

CN111488876B|2020-06-28|2020-10-23|平安国际智慧城市科技股份有限公司|License plate recognition method, device, equipment and medium based on artificial intelligence|

Legal status:
2019-04-18| PLFP| Fee payment|Year of fee payment: 2 |

2019-11-29| PLSC| Publication of the preliminary search report|Effective date: 20191129 |

2020-04-22| PLFP| Fee payment|Year of fee payment: 3 |

2021-04-21| PLFP| Fee payment|Year of fee payment: 4 |

Priority:

Application number | Filing date | Title

FR1854286|2018-05-23|

FR1854286A|FR3081591B1|2018-05-23|2018-05-23|PROCESS FOR PROCESSING A STREAM OF VIDEO IMAGES|

EP19170338.8A| EP3572976A1|2018-05-23|2019-04-18|Method for processing a video image stream|

AU2019203516A| AU2019203516A1|2018-05-23|2019-05-20|Method for Processing A Stream of Video Images|

US16/420,960| US10867211B2|2018-05-23|2019-05-23|Method for processing a stream of video images|
